Abstract

The compression-complexity trade-off of lossy compression algorithms that are based on a random
codebook or a random database is examined. Motivated, in part, by recent results of
Gupta-Verdú-Weissman (GVW) and their underlying connections with the pattern-matching scheme of
Kontoyiannis' lossy Lempel-Ziv algorithm, we introduce a non-universal version of the lossy
Lempel-Ziv method (termed LLZ). The optimality of LLZ for memoryless sources is established, and its
performance is compared to that of the GVW divide-and-conquer approach. Experimental results
indicate that the GVW approach often yields better compression than LLZ, but at the price of
much higher memory requirements. To combine the advantages of both, we introduce a hybrid
algorithm (HYB) that utilizes both the divide-and-conquer idea of GVW and the single-database
structure of LLZ. It is proved that HYB shares with GVW the exact same rate-distortion performance
and implementation complexity, while, like LLZ, requiring less memory, by a factor which
may become unbounded, depending on the choice of the relevant design parameters. Experimental
results are also presented, illustrating the performance of all three methods on data generated by
simple discrete memoryless sources. In particular, the HYB algorithm is shown to outperform
existing schemes for the compression of some simple discrete sources with respect to the Hamming
distortion criterion.

Keywords: Lossy data compression, rate-distortion theory, pattern matching, Lempel-Ziv, random
codebook, fixed database

1 Introduction

One of the last major outstanding classical problems of information theory is the development of
general-purpose, practical, efficiently implementable lossy compression algorithms. The corresponding
problem for lossless data compression was essentially settled in the late 1970s by the advent of the
Lempel-Ziv (LZ) family of algorithms 58] 5^ 5^ and arithmetic coding 43| 3^ 26]; see also the texts
2o| 3]. Similarly, from the early- to mid-1990s on, efficient channel coding strategies emerged that
perform close to capacity, primarily using sparse graph codes, turbo codes, and local message-passing
decoding algorithms; see, e.g., 0BI3B, the texts [HSliS], and the references therein.

For lossy data compression, although there is a rich and varied literature on both theoretical results
and practical compression schemes, near-optimal, efficiently implementable algorithms are yet to be
discovered. From rate-distortion theory [a] 43|] we know that it is possible to achieve a sometimes
dramatic improvement in compression performance by allowing for a certain amount of distortion in
the reconstructed data. But the majority of existing algorithms are either compression-suboptimal or
they involve exhaustive searches of exponential complexity at the encoder, making them unsuitable
for realistic practical implementation.

Until the late 1990s, most of the research effort was devoted to addressing the issue of universality, 
see [2^ and the references therein, as well as [55|(3?] fsT^ fs^ 36] 13] 54| 49]; algorithms emphasizing 
more practical aspects have been proposed in [5l|]. In addition to many application-specific families 
of compression standards (e.g., JPEG for images and MPEG for video), there is a general theory 
of algorithm design based on vector quantization; see 3] 27] 7][l7] and the references therein. Yet 
another line of research, closer in spirit to the present work, is on lossy extensions of the celebrated 
Lempel-Ziv schemes, based on approximate pattern matching; see [35] [45] [52] [28] [2] [50] [3] [13] [3] [it]- 

More recently, there has been renewed interest in the compression-complexity trade-off, and in the
development of low-complexity compressors that give near-optimal performance, at least for simple
sources with known statistics. The lossy LZ algorithm of [24] is rate-distortion optimal and of
polynomial complexity, although, in part due to the penalty paid for universality, its convergence is
slow. For the uniform Bernoulli source, HHlii] present codes based on sparse graphs, and, although their
performance is promising, like earlier approaches they rely on exponential searches at the encoder. In
related work, 46] ^ present sparse-graph compression schemes with much more attractive complexity
characteristics, but suboptimal compression performance. Rissanen and Tabus [41] describe a different
method which, unlike most of the earlier approaches, is not based on a random (or otherwise
exponentially large) codebook. It has linear complexity in the encoder and decoder and, although it
appears to be rate-distortion suboptimal, it is an effective practical scheme for Bernoulli sources.
Sparse-graph codes that are compression-optimal and of subexponential complexity are constructed in 181 ].
A simulation-based iterative algorithm is presented in 24] and it is shown to be compression-optimal,
although its complexity is hard to evaluate precisely as it depends on the convergence of a Markov
chain Monte Carlo sampler. The more recent work [211] on the lossy compression of discrete Markov
sources also contains promising results; it is based on the combination of a Viterbi-like optimization
algorithm at the encoder followed by universal lossless compression.

The present work is partly motivated by the results reported in [19] by Gupta-Verdú-Weissman
(GVW). Their compression schemes are based on the "divide-and-conquer" approach, namely the idea
that instead of encoding a long message $x_1^n = (x_1, x_2, \ldots, x_n)$ using a classical random
codebook of blocklength $n$, it is preferable to break up $x_1^n$ into sub-blocks of shorter length
$\ell$, say, and encode the sub-blocks separately. The main results in [19] state that, with an
appropriately chosen sub-block length $\ell$, it is possible to achieve asymptotically optimal
rate-distortion performance with near-linear implementation complexity (in a sense made precise in
Section 3 below).

Our starting point is the observation that there is a closely related, in a sense dual, point of view.
On a conceptual as well as mathematical level, the divide-and-conquer approach is very closely related
to a pattern-matching scheme with a restricted database. In the divide-and-conquer setting, given a
target distortion level $D$ and an $\ell \geq 1$, each sub-block of length $\ell$ in the original
message is encoded using a random codebook consisting of $\approx 2^{\ell R(D)}$ codewords, where
$R(D)$ is the rate-distortion function of the source being compressed (see the following section for
more details and rigorous definitions). To encode each sub-block, the encoder searches all
$2^{\ell R(D)}$ entries of the codebook, in order to find the one which has the smallest distortion
with respect to that sub-block.

Now suppose that, instead of a random codebook, the encoder and decoder share a random
database with length $M \approx 2^{\ell R(D)}$, generated from the same distribution as the
Shannon-optimal codebook. As in [24], the encoder searches for the longest prefix
$x_1^L = (x_1, x_2, \ldots, x_L)$ of the message $x_1^n$ that matches somewhere in the database with
distortion $D$ or less. Then the prefix $x_1^L$ is described to the decoder by describing the position
and length of the match in the database, and the same process is repeated inductively starting at
$x_{L+1}$. Although the match-length $L$ is random, we know 1^ 24] that, asymptotically, it behaves like,

$$L \approx \frac{\log M}{R(D)} \approx \ell, \quad \text{with high probability.}$$

Therefore, because the length $M$ of the database was chosen to be $\approx 2^{\ell R(D)}$, in effect
both schemes will individually encode sub-blocks of approximately the same length $\ell$, and will
also have comparable implementation complexity at the encoder.^1

Thus motivated, after reviewing the GVW scheme in Section 2 we introduce a (non-universal)
version of the lossy LZ scheme in [24], termed LLZ, and we compare its performance to that of GVW.
Theorem 1 shows that LLZ is asymptotically optimal in the rate-distortion sense for compressing data
from a known discrete memoryless source with respect to a single-letter distortion criterion. Simulation
results are also presented, comparing the performance of LLZ and GVW on a simple Bernoulli source.
These results indicate that for blocklengths around 1000 bits, GVW offers better compression than
LLZ at a given distortion level, but it requires significantly more memory for its execution. [The same
findings are also confirmed in the other simulation examples presented in Section 4.]

In order to combine the different advantages of the two schemes, in Section 3 we introduce a hybrid
algorithm (HYB), which utilizes both the divide-and-conquer idea of GVW and the single-database
structure of LLZ. In Theorems 2 and 3 we prove that HYB shares with GVW the exact same
rate-distortion performance and implementation complexity, in that it operates in near-linear time at
the encoder and linear time at the decoder. Moreover, like LLZ, the HYB scheme requires much less
memory, by an unbounded factor, depending on the choice of parameters in the design of the two
algorithms. Experimental results are presented in Section 4 comparing the performance of GVW and
HYB. These confirm the theoretical findings, and indicate that HYB outperforms existing schemes
for the compression of some simple discrete sources with respect to the Hamming distortion criterion.
The earlier theoretical results stating that HYB's rate-distortion performance is the same as GVW's
are confirmed empirically, and it is also shown that, again for blocklengths of approximately 1000
symbols, the HYB scheme requires much less memory, by a factor ranging between 15 and 240.

After a brief discussion on potential extensions of the present results, some conclusions are collected
in Section 5. The appendix contains the proofs of the theorems in Sections 2 and 3.



2 The GVW and LLZ algorithms 

After describing the basic setting within which all later results will be developed, in Section 2.2 we
recall the divide-and-conquer idea of the GVW scheme, and in Section 2.3 we present a new,
non-universal lossy LZ algorithm and examine its properties.

^1 It is well-known that the main difficulty in designing effective lossy compressors is in the implementation complexity
of the encoder. Therefore, in all subsequent results dealing with complexity issues we focus on the case of the encoder.
Moreover, it is easy to see that the decoding complexity of all the schemes considered here is linear in the message length.

2.1 The setting

Let $\mathbf{X} = \{X_1, X_2, \ldots\}$ be a memoryless source on some finite alphabet $A$ and suppose
that its distribution is described by a known probability mass function $P$ on $A$. The objective is
to compress $\{X_n\}$ with respect to a sequence of single-letter distortion criteria,

$$\rho_n(x_1^n, y_1^n) = \frac{1}{n} \sum_{i=1}^{n} \rho(x_i, y_i), \quad n \geq 1,$$

where $x_1^n = (x_1, x_2, \ldots, x_n) \in A^n$ is an arbitrary source string to be compressed,
$y_1^n = (y_1, y_2, \ldots, y_n)$ is a potential reproduction string taking values in a finite
reproduction alphabet $\hat{A}$, and $\rho : A \times \hat{A} \to [0, \infty)$ is an arbitrary
distortion measure. We make the customary assumption that for any source letter $x$ there is a
reproduction letter $y$ with zero distortion,

$$\max_{x \in A} \min_{y \in \hat{A}} \rho(x, y) = 0.$$

The best achievable rate at which data from the source $\mathbf{X}$ can be compressed with distortion
not exceeding $D \geq 0$ is given by the rate-distortion function [43],

$$R(D) = \inf_{W(y|x) \,:\, \sum_{x \in A,\, y \in \hat{A}} P(x) W(y|x) \rho(x, y) \leq D} I(X; Y), \qquad (1)$$

where $I(X; Y)$ denotes the mutual information between a random variable $X$ with the same
distribution $P$ as the source and a random variable $Y$ with conditional distribution $W(\cdot|x)$
given $X = x$.^2 Let $D_{\max} = \min_{y \in \hat{A}} E_P[\rho(X, y)]$; in order to avoid the trivial
case where $R(D)$ is identically equal to zero, $D_{\max}$ is assumed to be strictly positive. It is
well-known and easy to check that, for all distortion values in the nontrivial range
$0 < D < D_{\max}$, there is a conditional distribution $W^*(y|x)$ that achieves the infimum in (1),
and this induces a distribution $Q^*$ on $\hat{A}$ via $Q^*(y) = \sum_{x \in A} P(x) W^*(y|x)$,
$y \in \hat{A}$. With a slight abuse of terminology (as $Q^*$ may not be unique) we refer to $Q^*$ as
the optimal reproduction distribution at distortion level $D$. Recall also the analogous definition of
the distortion-rate function $D(R)$ of the source; cf. [6] [9].
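
As a concrete instance of these definitions, consider a Bernoulli($p$) source with Hamming distortion, the case used in the experiments later on. The closed forms below are standard textbook facts rather than results of this paper: $R(D) = h(p) - h(D)$ for $0 \leq D < \min(p, 1-p)$ and zero otherwise, $D_{\max} = \min(p, 1-p)$, and $Q^*(1) = (p - D)/(1 - 2D)$. The following is a minimal sketch; all function names are illustrative.

```python
import math


def binary_entropy(q: float) -> float:
    """Binary entropy h(q) in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def bernoulli_d_max(p: float) -> float:
    """D_max = min_y E[rho(X, y)] for Hamming distortion on {0, 1}."""
    return min(p, 1.0 - p)


def bernoulli_rate_distortion(p: float, d: float) -> float:
    """R(D) = h(p) - h(D) on 0 <= D < D_max, and 0 for D >= D_max."""
    if d >= bernoulli_d_max(p):
        return 0.0
    return binary_entropy(p) - binary_entropy(d)


def bernoulli_optimal_reproduction(p: float, d: float) -> float:
    """Q*(1): the reproduction law that, passed through a binary symmetric
    channel with crossover d, yields a Bernoulli(p) source symbol."""
    assert 0.0 <= d < bernoulli_d_max(p)
    return (p - d) / (1.0 - 2.0 * d)
```

For example, `bernoulli_rate_distortion(0.4, 0.05)` evaluates the rate-distortion function of the Bern(0.4) source at the smallest target distortion used in the simulations.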



2.2 The GVW algorithm 

The GVW algorithm^3 is a fixed-rate, variable-distortion code of blocklength $n$ and target distortion
$D \in (0, D_{\max})$. It is described in terms of two parameters: a "small" $\gamma > 0$, and an
integer $\ell$ so that $n = k\ell$.

Given the target distortion level $D$, let $R = R(D) + \gamma$, and take,

$$\bar{D} = R^{-1}(R(D) + \gamma/2) = D(R(D) + \gamma/2) < D. \qquad (2)$$

First a fixed-rate code of blocklength $\ell$ and rate $R$ is created according to Shannon's classical
random codebook construction. Letting $Q^*$ denote the optimal reproduction distribution at level
$\bar{D}$, the codebook consists of $\lfloor 2^{\ell R} \rfloor$ i.i.d. codewords of length $\ell$,
each generated i.i.d. from $Q^*$. Writing
$x_1^n = x_1^{\ell} * x_{\ell+1}^{2\ell} * \cdots * x_{(k-1)\ell+1}^{k\ell}$ as the concatenation of
$k$ sub-blocks, each sub-block is matched to its $\rho_\ell$-nearest neighbor in the codebook, and it
is described to the decoder using $\lceil \log \lfloor 2^{\ell R} \rfloor \rceil \approx \ell R$ bits
to describe the index of that nearest neighbor in the codebook.

This code is used $k$ times, once on each of the $k$ sub-blocks, to produce corresponding reconstruction
strings $y_{(i-1)\ell+1}^{i\ell}$, for $i = 1, 2, \ldots, k$. The description of $x_1^n$ is the
concatenation of the descriptions of the individual sub-blocks, and the reconstruction string itself
is the concatenation of the corresponding reproduction blocks,
$y_1^n = y_1^{\ell} * y_{\ell+1}^{2\ell} * \cdots * y_{(k-1)\ell+1}^{k\ell}$. The overall description
length of this code is $k \lceil \log \lfloor 2^{\ell R} \rfloor \rceil \leq k\ell R = nR$ bits, so the
(fixed) rate of this code is $\leq R$ bits/symbol, and its (variable) distortion is
$\rho_n(x_1^n, y_1^n)$.

^2 The mutual information, rate-distortion function, and all other standard information-theoretic quantities here and
throughout are expressed in bits; all logarithms are taken to be in base 2, unless stated otherwise.

^3 To be more precise, this is one of two closely related schemes discussed in [19]; see the relevant comments in Section 3.
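
The encoder and decoder just described can be sketched as follows for a binary alphabet with Hamming distortion. This is an illustrative sketch, not the authors' implementation: the function names, the use of a shared pseudo-random seed to stand in for the shared codebook, and the argument `q1` (playing the role of $Q^*(1)$) are all assumptions made here.

```python
import math
import random


def mismatches(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(1 for s, t in zip(a, b) if s != t)


def make_codebook(ell, rate, q1, seed=0):
    """floor(2**(ell*rate)) binary codewords of length ell, entries i.i.d. Bernoulli(q1)."""
    rng = random.Random(seed)  # a shared seed stands in for the shared codebook
    size = math.floor(2 ** (ell * rate))
    return [tuple(int(rng.random() < q1) for _ in range(ell)) for _ in range(size)]


def gvw_encode(x, codebook):
    """One nearest-neighbour codebook index per length-ell sub-block of x."""
    ell = len(codebook[0])
    assert len(x) % ell == 0, "blocklength n must equal k * ell"
    indices = []
    for start in range(0, len(x), ell):
        block = x[start:start + ell]
        # Exhaustive search over every codeword of the codebook.
        best = min(range(len(codebook)), key=lambda j: mismatches(block, codebook[j]))
        indices.append(best)
    return indices


def gvw_decode(indices, codebook):
    """Concatenate the codewords selected by the encoder."""
    out = []
    for j in indices:
        out.extend(codebook[j])
    return out


def description_bits(num_blocks, codebook_size):
    """k * ceil(log2(codebook size)) bits in total."""
    return num_blocks * math.ceil(math.log2(codebook_size))
```

The codebook stores $\ell \lfloor 2^{\ell R} \rfloor$ symbols, which is the memory cost that the later sections compare against a single database.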



2.3 The lossy Lempel-Ziv algorithm LLZ 

The LLZ algorithm described here can be seen as a simplified (in that it is non-universal) and modified
(to facilitate the comparison below) version of the algorithm in [24]. It is a fixed-distortion,
variable-rate code of blocklength $n$, described in terms of three parameters: an integer blocklength
$\ell < n$, and "small" $\alpha, \gamma > 0$. The algorithm will be presented in a setting "dual" to
that of the GVW algorithm, in the sense that was described in the Introduction. The main difference is
that the source string $x_1^n$ will be parsed into substrings of variable length, not of fixed length
$\ell$.

Given $n$ and a target distortion level $D$, define $R = R(D) + \gamma$, take,

$$\bar{D} = R^{-1}(R(D) - \gamma/2) = D(R(D) - \gamma/2) > D,$$

and let $Q^*$ denote the optimal reproduction distribution at level $\bar{D}$. Then generate a single
i.i.d. database $Y_1^m = (Y_1, Y_2, \ldots, Y_m)$ of length,

$$m = m(\ell) = \lfloor 2^{\ell R} \rfloor + \ell - 1, \qquad (3)$$

and make it available to both the encoder and decoder.

The encoding algorithm is as follows: The encoder calculates the length of the longest match (up
to $(1+\alpha)\ell$-many symbols) of an initial portion of the message $x_1^n$, within distortion
$\bar{D}$, in the database. Let $L_{\ell,1}$ denote the length of this longest match,

$$L_{\ell,1} = \max\{1 \leq k \leq (1+\alpha)\ell \,:\, \rho_k(x_1^k, Y_i^{i+k-1}) \leq \bar{D} \text{ for some } 1 \leq i \leq m - k + 1\},$$

and let $Z^{(1)} = x_1^{L_{\ell,1}}$ denote the initial phrase of length $L_{\ell,1}$ in $x_1^n$. Then
the encoder describes to the decoder:

(a) the length $L_{\ell,1}$; this takes $\lceil \log((1+\alpha)\ell) \rceil$ bits;

(b) the position $i$ in the database where the match occurs; this takes $\lceil \log m \rceil$ bits.

From (a) and (b) the decoder can recover the string $\hat{Z}^{(1)} = Y_i^{i+L_{\ell,1}-1}$, which is
within distortion $\bar{D}$ of $Z^{(1)}$.

Alternatively, $Z^{(1)}$ can be described with zero distortion by first describing its length
$L_{\ell,1}$ as before, and then describing $Z^{(1)}$ itself directly using,

$$\lceil L_{\ell,1} \log |A| \rceil \text{ bits.} \qquad (4)$$

The encoder uses whichever one of the two descriptions is shorter. [Note that it is not necessary to
add a flag to indicate which one was chosen; the decoder can simply check if
$\lceil L_{\ell,1} \log |A| \rceil$ is larger or smaller than $\lceil \log m \rceil$.] Therefore, from
(a), (b), and (4) the length of the description of $Z^{(1)}$ is,

$$\lceil \log((1+\alpha)\ell) \rceil + \min\{\lceil \log m \rceil, \lceil L_{\ell,1} \log |A| \rceil\} \text{ bits.} \qquad (5)$$

^4 Note that in [24] a fixed-rate, variable-distortion universal code is also described, but we restrict attention here to
the conceptually simpler fixed-distortion algorithm.



After $Z^{(1)}$ has been described within distortion $\bar{D}$, the same process is repeated to encode
the rest of the message: The encoder finds the length $L_{\ell,2}$ of the longest string starting at
position $(L_{\ell,1} + 1)$ in $x_1^n$ that matches within distortion $\bar{D}$ into the database, and
describes $Z^{(2)} = x_{L_{\ell,1}+1}^{L_{\ell,1}+L_{\ell,2}}$ to the decoder by repeating the above
steps. The algorithm is terminated, in the natural way, when the entire string $x_1^n$ has been
exhausted. At that point, $x_1^n$ has been parsed into $\Pi_\ell = \Pi_\ell(x_1^n, D)$ distinct phrases
$Z^{(k)}$, each of length $L_{\ell,k}$, $x_1^n = Z^{(1)} * Z^{(2)} * \cdots * Z^{(\Pi_\ell)}$, with the
possible exception of the last phrase, which may be shorter. Since each substring $Z^{(k)}$ is
described within distortion $\bar{D}$, also the concatenation of all the reproduction strings, call it
$y_1^n := \hat{Z}^{(1)} * \hat{Z}^{(2)} * \cdots * \hat{Z}^{(\Pi_\ell)}$, will be within distortion
$\bar{D}$ of $x_1^n$.

The distortion achieved by this code is $\rho_n(x_1^n, y_1^n)$, and it is guaranteed to be
$\leq \bar{D}$ by construction.
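Regarding the rate, if we write $\Lambda(x_1^n) = \Lambda(x_1^n, \ell, D)$ for the overall description
length of $x_1^n$, then from (5),

$$\Lambda(x_1^n) = \sum_{k=1}^{\Pi_\ell} \Big[ \lceil \log((1+\alpha)\ell) \rceil + \min\{\lceil \log m \rceil, \lceil L_{\ell,k} \log |A| \rceil\} \Big] \text{ bits,} \qquad (6)$$

and the rate achieved by this code is $\Lambda(x_1^n)/n$ bits/symbol.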
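
The parsing and the description length in (5) and (6) can be sketched as follows for Hamming distortion, using a naive scan of the database. This is an illustrative sketch under stated assumptions, not the authors' code: the names are hypothetical, ties between the two descriptions are resolved in favour of the zero-distortion one, and a position where no match of any length exists falls back to a single raw symbol.

```python
import math


def mismatches(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(1 for s, t in zip(a, b) if s != t)


def longest_match(x, pos, db, max_len, d_bar):
    """Longest k <= max_len such that x[pos:pos+k] lies within Hamming
    distortion d_bar of some length-k window of db.
    Returns (k, window_start), or (0, None) if nothing matches."""
    limit = min(max_len, len(x) - pos)
    for k in range(limit, 0, -1):            # try the longest lengths first
        target = x[pos:pos + k]
        for i in range(len(db) - k + 1):     # naive scan over all windows
            if mismatches(target, db[i:i + k]) <= d_bar * k:
                return k, i
    return 0, None


def llz_encode(x, db, ell, alpha, d_bar, alphabet_size=2):
    """Parse x into phrases; return (phrase lengths, reproduction, total bits)."""
    assert len(db) >= alphabet_size
    max_len = math.floor((1 + alpha) * ell)
    len_bits = math.ceil(math.log2((1 + alpha) * ell))   # item (a)
    pos_bits = math.ceil(math.log2(len(db)))             # item (b)
    pos, total_bits, recon, phrases = 0, 0, [], []
    while pos < len(x):
        k, i = longest_match(x, pos, db, max_len, d_bar)
        if k == 0:
            k = 1  # assumption: no match at all, send one raw symbol
        raw_bits = math.ceil(k * math.log2(alphabet_size))   # equation (4)
        if raw_bits <= pos_bits:
            recon.extend(x[pos:pos + k])     # zero-distortion description
        else:
            recon.extend(db[i:i + k])        # database pointer
        total_bits += len_bits + min(pos_bits, raw_bits)     # equation (5)
        phrases.append(k)
        pos += k
    return phrases, recon, total_bits
```

Dividing the returned bit count by `len(x)` gives the empirical rate $\Lambda(x_1^n)/n$ in bits/symbol.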

Remark. As mentioned in the Introduction, there are two main differences between the GVW
algorithm and the LLZ scheme. The first one is that while the GVW is based on a Shannon-style
random codebook, the LLZ uses an LZ-type random database. The second is that GVW divides
up the message $x_1^n$ into fixed-length sub-blocks of size $\ell$, whereas LLZ parses $x_1^n$ into
variable-length strings of (random) lengths $L_{\ell,k}$. But there is also an important point of
solidarity between the two algorithms. Recall Theorem 23] that, for large $\ell$, the match length
$L_{\ell,1}$ behaves logarithmically in the size of the database; that is, with high probability,

$$L_{\ell,1} \approx \frac{\log m(\ell)}{R(\bar{D})} \approx \ell,$$

where the second approximation follows by the choice of $m(\ell)$ and of $\bar{D}$. Therefore, both
algorithms end up parsing the message $x_1^n$ into sub-blocks of length $\approx \ell$ symbols.
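
To see why the second approximation holds, one can substitute the parameter choices stated above. The short calculation below is a sketch that assumes only $m(\ell) = \lfloor 2^{\ell R} \rfloor + \ell - 1$ with $R = R(D) + \gamma$, and $R(\bar{D}) = R(D) - \gamma/2$; it ignores the floor and the additive $\ell - 1$ term.

```latex
\frac{\log m(\ell)}{R(\bar{D})}
  \approx \frac{\ell R}{R(\bar{D})}
  = \ell \cdot \frac{R(D) + \gamma}{R(D) - \gamma/2}
  = \ell \left( 1 + \frac{3\gamma/2}{R(D) - \gamma/2} \right)
```

For small $\gamma$ the factor in parentheses is close to one, so the typical match length is close to $\ell$.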

Our first result shows that LLZ is asymptotically optimal in the usual sense established for
fixed-database versions of LZ-like schemes; see 48|] 24]. Specifically, it is shown that by taking
$\ell$ large enough and $\gamma$ small enough, the LLZ comes arbitrarily close to any optimal
rate-distortion point $(R(D), D)$. Note that $\alpha > 0$ is a parameter that simply controls the
complexity of the best-match search, and its influence on the rate-distortion performance is
asymptotically irrelevant.

Theorem 1. [LLZ Optimality] Suppose the LLZ with parameters $\ell$, $\alpha$ and $\gamma$ is used to
compress a memoryless source $\{X_n\}$ with rate-distortion function $R(D)$ at a target distortion
level $D \in (0, D_{\max})$. For any $\delta > 0$, the parameter $\gamma > 0$ can be chosen small
enough such that:

(a) For any choice of $\ell$ and any blocklength $n$, the distortion achieved by LLZ is no greater
than $D + \delta$.

(b) Taking $\ell$ large enough, the asymptotic rate of LLZ achieves the rate-distortion bound, in that,

$$\limsup_{\ell \to \infty} \limsup_{n \to \infty} E\left[ \frac{\Lambda(X_1^n)}{n} \,\Big|\, X_1^n \right] \leq R(\bar{D}) = R(D) - \gamma/2 \text{ bits/symbol, w.p.1,} \qquad (7)$$

where the expectation is over all databases. Therefore, also,

$$\limsup_{\ell \to \infty} \limsup_{n \to \infty} E\left[ \frac{\Lambda(X_1^n)}{n} \right] \leq R(\bar{D}) = R(D) - \gamma/2 \text{ bits/symbol,} \qquad (8)$$

with the expectation here being over both the message and the databases.

Next, the performance of LLZ is compared with that of GVW on data generated from a Bernoulli
source with parameter p = 0.4 and with respect to Hamming distortion. Simulation results at different
target distortions are shown in Figure 1 and Table 1; see Section 4 for details on the choice of
parameter values. It is clear from these results that, at the same distortion level, the GVW algorithm
typically gives a better rate than LLZ. In terms of implementation complexity, the two algorithms
have comparable execution times, but the LLZ uses significantly less memory. The same pattern -
GVW giving better compression but using much more memory than LLZ - is also confirmed in the
other examples we consider in Section 4.

Note that, like for the case of GVW, more can be said about the implementation complexity of
LLZ and how it depends on the exact choice of parameters $\ell$, $\alpha$ and $\gamma$. But since, as
we will see next, the performance of both algorithms is dominated by that of a different algorithm
(HYB), we do not pursue this direction further.




Figure 1: Comparison of the rate-distortion performance of LLZ vs. that of GVW, on a data string of length n = 1050
bits generated from a Bernoulli source with parameter p = 0.4. The solid line is the rate-distortion function, the
rate-distortion pairs achieved by LLZ are shown as red stars and those of GVW as blue diamonds.

Bern(0.4) source, Hamming distortion: performance parameters

Algorithm   D_target   D_achieved    rate          memory        time
GVW         0.05       0.07143       0.70095       26MB          27m53s
GVW         0.08       0.10286       0.59143       23MB          21m11s
GVW         0.11       [illegible]   [illegible]   27MB          18m48s
GVW         0.2        [illegible]   0.26286       [illegible]   19m18s
GVW         0.23       0.22857       [illegible]   57MB          18m42s
GVW         0.26       [illegible]   0.15333       [illegible]   [illegible]
GVW         0.29       0.31429       0.10952       113MB         20m29s
LLZ         0.05       0.03238       1.00029       1.5MB         4m23s
LLZ         0.08       0.07524       0.79129       1.28MB        6m15s
LLZ         0.11       0.10571       0.6754        1.46MB        8m53s
LLZ         0.14       0.1381        0.55171       1.69MB        11m15s
LLZ         0.17       0.16952       0.41827       2.6MB         18m15s
LLZ         0.2        0.2019        0.36381       3.6MB         20m09s
LLZ         0.23       0.23333       0.27975       6.2MB         41m32s
LLZ         0.26       0.26571       0.23102       13MB          63m56s
LLZ         0.29       0.29714       0.1741        47MB          165m54s

Table 1: Comparison of the performance of LLZ vs. that of GVW on a data string of length n = 1050 bits generated
from a Bernoulli source with parameter p = 0.4.

3 The HYB algorithm

In order to combine the rate-distortion advantage of GVW with the memory advantage of LLZ, in
this section we introduce a hybrid algorithm and examine its performance.

The new algorithm, termed HYB, uses the divide-and-conquer approach of GVW, but based on a
random database like the LLZ instead of a random codebook. It is a fixed-rate, variable-distortion code
of blocklength $n$ and target distortion $D \in (0, D_{\max})$, and it is described in terms of two
parameters: a "small" $\gamma > 0$, and an integer $\ell$ so that $n = k\ell$.

Like with the GVW, given a target distortion level $D$, let $R = R(D) + \gamma$ and take $\bar{D}$ as
in (2). Now, like for the LLZ algorithm, let $m = m(\ell) = \lfloor 2^{\ell R} \rfloor + \ell - 1$ as
in (3), and generate a random database $Y_1^m = (Y_1, Y_2, \ldots, Y_m)$, where the $Y_i$ are drawn
i.i.d. from the optimal reproduction distribution at level $\bar{D}$. The database is made available
to both the encoder and the decoder, and the message to be compressed is parsed into $k = n/\ell$
non-overlapping blocks, $x_1^n = x_1^{\ell} * x_{\ell+1}^{2\ell} * \cdots * x_{(k-1)\ell+1}^{k\ell}$.

The first sub-block $x_1^{\ell}$ is matched to its $\rho_\ell$-nearest neighbor in the database, where
we consider each possible $Y_i^{i+\ell-1}$, $i = 1, 2, \ldots, \lfloor 2^{\ell R} \rfloor$, as a
potential reproduction word. Then $x_1^{\ell}$ is described to the decoder by describing the position
of its matching reproduction block in the database using $\approx \ell R$ bits, and the same process
is repeated on each of the $k$ sub-blocks, to produce $k$ reconstruction strings. The description of
$x_1^n$ is the concatenation of the descriptions of the individual sub-blocks, and the reconstruction
string itself is the concatenation of the corresponding reproduction blocks. The overall description
length of this code is $k \lceil \log \lfloor 2^{\ell R} \rfloor \rceil \leq k\ell R = nR$ bits.
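
The HYB encoder can be sketched in the same style as a fixed-length nearest-neighbour search over the overlapping windows of a single database. As before this is an illustrative sketch for a binary alphabet with Hamming distortion; the names, the seed-based shared database and the parameter `q1` are assumptions made here, not part of the paper.

```python
import math
import random


def mismatches(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(1 for s, t in zip(a, b) if s != t)


def make_database(ell, rate, q1, seed=0):
    """A single i.i.d. Bernoulli(q1) string of length floor(2**(ell*rate)) + ell - 1."""
    rng = random.Random(seed)  # a shared seed stands in for the shared database
    m = math.floor(2 ** (ell * rate)) + ell - 1
    return [int(rng.random() < q1) for _ in range(m)]


def hyb_encode(x, db, ell):
    """One database position per length-ell sub-block of x."""
    assert len(x) % ell == 0, "blocklength n must equal k * ell"
    num_windows = len(db) - ell + 1   # equals floor(2**(ell*rate))
    positions = []
    for start in range(0, len(x), ell):
        block = x[start:start + ell]
        # Every overlapping window of length ell is a candidate reproduction word.
        best = min(range(num_windows), key=lambda i: mismatches(block, db[i:i + ell]))
        positions.append(best)
    return positions


def hyb_decode(positions, db, ell):
    """Concatenate the database windows selected by the encoder."""
    out = []
    for i in positions:
        out.extend(db[i:i + ell])
    return out
```

The number of candidates searched per sub-block is the same as for a codebook of $\lfloor 2^{\ell R} \rfloor$ words, but only $m(\ell)$ symbols are stored.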

The following result shows that the HYB algorithm shares the exact same rate-distortion performance,
as well as the same implementation complexity characteristics, as the GVW. Let:

$$\bar{\gamma} = \min\{1, 2(R(D/2) - R(D))\}.$$

Theorem 2. [HYB Compression/Complexity Trade-off] Consider a memoryless source $\{X_n\}$
with rate-distortion function $R(D)$, which is to be compressed at target distortion level
$D \in (0, D_{\max})$. There exists an $\bar{\epsilon} > 0$ such that, for any
$0 < \epsilon < \bar{\epsilon}$, the HYB algorithm with parameters $0 < \gamma < \bar{\gamma}$ and
$\ell$ as in (12) achieves a rate of $R = R(D) + \gamma$ bits/symbol, its expected distortion is less
than $D + \epsilon$, and moreover:

- Encoding time per source symbol is proportional to $(\lambda_1/\epsilon)^{\lambda_2(D)/\gamma^2}$,

- Decoding time per symbol is independent of $\gamma$ and $\epsilon$,

where $\lambda_1$ and $\lambda_2(D)$ are independent of $\epsilon$ and $\gamma$.

Remarks.

1. Theorem 2 is an exact analog of Theorem 1 proved for GVW in [19], the only difference being
that we consider average distortion instead of the probability-of-excess distortion criterion. The reason
is that, instead of presenting an existence proof for an algorithm with certain desired properties, here
we examine the performance of the HYB algorithm itself. Indeed, the proof of Theorem 2 can easily
be modified to prove the stronger claim that there exists some instance of the random database
$Y_1^m$ such that, using that particular database, the HYB algorithm also has the additional property
that the probability of excess distortion vanishes as $n \to \infty$. The same comments apply to
Theorem 3 below.

2. In [19] a similar result is proved with the roles of $\epsilon$ and $\gamma$ interchanged. In fact,
it should be pointed out that the scheme we call "the" GVW algorithm here corresponds to the scheme
used in the proof of [19, Theorem 1]. A slight variant (having to do with the choice of parameter
values and not with the mechanics of the algorithm itself) is used to prove [19, Theorem 2]. Having
gone over the proofs, it would be obvious to the reader that, once the corresponding changes are made
for HYB, an analogous result can be proved for HYB. The straightforward but tedious details are omitted.

3. In terms of memory, the GVW scheme requires $\ell \lfloor 2^{\ell R} \rfloor$ reproduction symbols
for storing the codebook, while using the same parameters the HYB algorithm needs
$m(\ell) = \lfloor 2^{\ell R} \rfloor + \ell - 1$ symbols. The ratio between the two is,

$$\frac{\text{memory for GVW}}{\text{memory for HYB}} = \frac{\ell \lfloor 2^{\ell R} \rfloor}{\lfloor 2^{\ell R} \rfloor + \ell - 1} \approx \ell,$$

so that the GVW needs $\approx \ell$ times more memory than HYB. Moreover, the closer we require the
algorithm to come to achieving an optimal $(D, R(D))$ point, the smaller the values of $\epsilon$ and
$\gamma$ need to be taken in Theorem 2, and the larger the corresponding value of $\ell$; cf.
equation (12). Therefore, not only the difference, but even the ratio of the memory required by GVW
compared to HYB, is unbounded.
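
The memory comparison in Remark 3 is easy to evaluate numerically. The sketch below simply computes the two symbol counts from the formulas above; the function name and the sample parameter values are illustrative.

```python
import math


def memory_symbols(ell, rate):
    """Reproduction symbols stored by GVW (codebook) and by HYB (database)
    for sub-block length ell and rate R, following Remark 3."""
    size = math.floor(2 ** (ell * rate))   # number of candidate reproduction words
    gvw = ell * size                       # codebook: `size` codewords of length ell
    hyb = size + ell - 1                   # database: overlapping windows share symbols
    return gvw, hyb


gvw, hyb = memory_symbols(ell=20, rate=0.5)
ratio = gvw / hyb   # slightly below ell, and approaching ell as 2**(ell*rate) grows
```

Since the ratio is $\ell \lfloor 2^{\ell R} \rfloor / (\lfloor 2^{\ell R} \rfloor + \ell - 1)$, it is always strictly below $\ell$ for $\ell > 1$ and approaches $\ell$ once the number of windows dominates $\ell$.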

The next result shows that, choosing the parameters $\ell$ and $\gamma$ in HYB appropriately, optimal
compression performance can be achieved with linear decoding complexity and near-linear encoding
complexity. It is a parallel result to [19, Theorem 3].

Theorem 3. [HYB Near-Linear Complexity] For a memoryless source $\{X_n\}$ with rate-distortion
function $R(D)$, a target distortion level $D \in (0, D_{\max})$, and an arbitrary increasing and
unbounded function $g(n)$, the HYB algorithm with appropriately chosen parameters $\ell = \ell(n)$ and
$\gamma = \gamma(n)$, achieves a limiting rate equal to $R(D)$ bits/symbol and limiting average
distortion $D$. The encoding and decoding complexities are $O(n g(n))$ and $O(n)$ respectively.

The actual empirical performance of HYB on simulated data is compared to that of GVW and
LLZ in the following section.

4 Simulation results

Here the empirical performance of the HYB scheme is compared with that of GVW and LLZ, on three
simulated data sets from simple memoryless sources.^5 The following parameter values were used in all
of the experiments. For the GVW and HYB algorithms, $\ell$ was chosen as in [19] to be $\ell$ = [-^^^p^] ,
where $R(D)$ is the rate-distortion function of the source, and $\gamma$ was taken equal to 0.002.
Similarly, for LLZ we took $\ell = \lceil 22/R(D) \rceil$, $\gamma = 0.03$ and $\alpha = 0.1$. Note
that, with this choice of parameters, the complexity of all three algorithms is approximately linear
in the message length $n$. All experiments were performed on a Sony Vaio laptop running Ubuntu Linux,
under identical conditions.^6

First we revisit the example of Section 2: n = 1050 bits generated by a Bernoulli source with
parameter p = 0.4 are compressed by all three algorithms at various different distortion levels with
respect to Hamming distortion. Figure 2 shows the rate-distortion pairs achieved.
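
For orientation, the LLZ parameter choice quoted above can be evaluated for a Bernoulli source. The sketch below assumes the standard closed form $R(D) = h(p) - h(D)$ for a Bernoulli($p$) source under Hamming distortion (a textbook fact, not stated in this section) and uses the database length $m(\ell) = \lfloor 2^{\ell R} \rfloor + \ell - 1$ with $R = R(D) + \gamma$; the function names are illustrative.

```python
import math


def binary_entropy(q):
    """Binary entropy h(q) in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def llz_parameters(p, d, gamma=0.03, c=22):
    """Return (ell, m) for LLZ on a Bernoulli(p) source at target distortion d,
    with ell = ceil(c / R(D)) and m = floor(2**(ell * (R(D) + gamma))) + ell - 1.
    Valid for 0 <= d < min(p, 1 - p), where R(D) = h(p) - h(d) > 0."""
    r_d = binary_entropy(p) - binary_entropy(d)
    ell = math.ceil(c / r_d)
    rate = r_d + gamma
    m = math.floor(2 ** (ell * rate)) + ell - 1
    return ell, m


ell, m = llz_parameters(p=0.4, d=0.05)
```

Because $\ell$ scales like $1/R(D)$, the exponent $\ell R$ stays roughly constant across target distortions, which is what keeps the search cost per sub-block comparable from one experiment to the next.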




Figure 2: Comparison of the rate-distortion performance of GVW, LLZ and HYB on a data string of length n = 1050
bits generated from a Bernoulli source with parameter p = 0.4. The solid convex curve is the rate-distortion function;
the rate-distortion pairs achieved by GVW are shown as blue diamonds; by LLZ as red stars; and by HYB as bold green
dots.

Rate-distortion performance. It is evident that the compression performance obtained by GVW
and HYB is near-identical, and better than that of LLZ. This example was also examined by Rissanen
and Tabus in [41], where it was noted that it is quite hard for any implementable scheme to produce
rate-distortion pairs below the straight line connecting the end-points $(D, R(D))$ of the
rate-distortion curve corresponding to $D = 0$ and $D = 0.4$. As noted in [19], the Rissanen-Tabus
scheme produces results slightly below the straight line, and it is one of the best implementable
schemes for this problem.

^5 We do not present comparison results with earlier schemes apart from the GVW, since extensive such studies already
exist in the literature; in particular, the GVW is compared in [19] with the algorithms proposed in [4^ . [l8l | and [41].

^6 Although there is a wealth of efficient algorithms for the problem of approximate string matching (see, e.g.,
[ill2 H M and the references therein), since HYB clearly outperforms LLZ, our version of the LLZ scheme was
implemented using the naive, greedy scheme consistent with the definition of the algorithm.



Bern(0.4) source, Hamming distortion

                       Performance parameters
Algorithm   D_target   D_achieved   rate      memory    time
HYB         0.05       0.06952      0.70095   0.79MB    2m45s
HYB         0.08       0.11238      0.59143   0.6MB     3m06s
HYB         0.11       0.12952      0.50381   0.59MB    3m33s
HYB         0.14       0.15714      0.41619   0.56MB    4m06s
HYB         0.17       0.19143      0.32857   0.52MB    4m40s
HYB         0.2        0.22095      0.26286   0.53MB    5m21s
HYB         0.23       0.23905      0.21905   0.51MB    5m26s
HYB         0.26       0.27048      0.15333   0.53MB    6m27s
HYB         0.29       0.29333      0.10952   0.53MB    6m56s


Table 2: Performance achieved by the HYB algorithm on a data string of length n = 1050 bits generated from a Bernoulli
source with parameter p = 0.4.

Memory and complexity. Tables 1 and 2 contain a complete listing of all performance parameters
obtained in the above experiment, including the execution time required for the encoder and the total
amount of memory used. As already observed in Section 2, the LLZ scheme requires much less memory
than GVW, and so does the hybrid algorithm HYB. In fact, while GVW and HYB produce essentially
identical rate-distortion performance, the HYB algorithm requires between 32 and 213 times less
memory than GVW. [Note that these figures are deterministic; the memory requirement is fixed by
the description of the algorithm and it is not subject to random variations produced by the simulated
data.] In terms of the corresponding execution times, GVW and HYB share the exact same
theoretical complexity in their implementation. Nevertheless, because of the vastly different memory
requirements, in practice we find that HYB ran approximately three to ten times faster than GVW.

The second example is again on a Bernoulli source with respect to Hamming distortion, this time
with source parameter p = 0.2. The corresponding simulation results are displayed in Figure 3 and
Table 3.

Finally, in the third example {X_n} is taken as a memoryless source uniformly distributed on
{0, 1, 2, 3}, to be compressed with respect to Hamming distortion. The empirical results are shown in
Figure 4 and Table 4.
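
For a source uniformly distributed on an alphabet of size M under Hamming distortion, the rate-distortion function has the standard closed form R(D) = log2(M) - h(D) - D log2(M - 1) for 0 <= D <= 1 - 1/M, where h is the binary entropy in bits. The sketch below, with illustrative names that are not taken from the paper, evaluates this curve for M = 4 and compares it with the HYB pairs reported in Table 4.

```python
import math


def binary_entropy(x):
    """Binary entropy h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def uniform_rd(alphabet_size, d):
    """R(D) for a uniform source under Hamming distortion, in bits/symbol."""
    d_max = 1 - 1 / alphabet_size
    if d >= d_max:
        return 0.0
    return (math.log2(alphabet_size) - binary_entropy(d)
            - d * math.log2(alphabet_size - 1))


# (D_achieved, rate) pairs for HYB, copied from Table 4.
HYB_TABLE_4 = [
    (0.1419, 1.41714), (0.19714, 1.16095), (0.25429, 0.92),
    (0.30762, 0.72286), (0.37238, 0.56952), (0.42095, 0.41619),
    (0.47143, 0.30667), (0.52952, 0.19714), (0.58476, 0.10952),
]

for d, rate in HYB_TABLE_4:
    gap = rate - uniform_rd(4, d)
    print(f"D={d:.5f}  R(D)={uniform_rd(4, d):.4f}  rate={rate:.5f}  gap={gap:.4f}")
```

The printed gap is the redundancy of each operating point relative to the rate-distortion function at the achieved distortion.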

In both these cases, the same qualitative conclusions are drawn. The rate-distortion performance of 
the GVW and HYB algorithms is essentially indistinguishable, while the compression achieved by LLZ 
is generally somewhat worse, though in several instances not significantly so. In the second example 
note that the memory required by HYB is smaller than that of GVW by a factor that ranges between 
44 and 242, while in the third example the corresponding factors are between 16 and 218. And again, 
although the theoretical implementation complexity of GVW and HYB is identical, because of their 
different memory requirements the encoding time of HYB is smaller than that of GVW by a factor 
ranging between approximately 3 and 9 in the second example, and between 1.25 and 1.5 in the third 
example. 
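
The memory and time factors quoted above can be recomputed directly from the tabulated values. The sketch below is an illustration only: it uses the first and last GVW and HYB rows of Tables 3 and 4, as read from the tables, and the helper names are assumptions.

```python
import re


def to_seconds(entry):
    """Convert a table entry such as '19m05s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", entry)
    if match is None:
        raise ValueError(f"unrecognised time entry: {entry!r}")
    return 60 * int(match.group(1)) + int(match.group(2))


# (memory in MB, encoding time) for the first and last rows of each table.
TABLE_ROWS = {
    "Bern(0.2), D_target=0.04": {"GVW": (25.0, "19m05s"), "HYB": (0.56, "2m02s")},
    "Bern(0.2), D_target=0.16": {"GVW": (126.0, "19m38s"), "HYB": (0.52, "7m11s")},
    "Uniform, D_target=0.1": {"GVW": (43.0, "10m27s"), "HYB": (2.58, "7m49s")},
    "Uniform, D_target=0.58": {"GVW": (229.0, "17m30s"), "HYB": (1.05, "11m43s")},
}


def gvw_over_hyb(row):
    """Return (memory ratio, time ratio) of GVW relative to HYB."""
    (gvw_mb, gvw_time), (hyb_mb, hyb_time) = row["GVW"], row["HYB"]
    return gvw_mb / hyb_mb, to_seconds(gvw_time) / to_seconds(hyb_time)


for label, row in TABLE_ROWS.items():
    mem_ratio, time_ratio = gvw_over_hyb(row)
    print(f"{label}: memory x{mem_ratio:.1f}, time x{time_ratio:.2f}")
```

These four rows give the end-points of the memory ranges stated in the text (44 and 242 for the second example, 16 and 218 for the third).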
Figure 3: Comparison of the rate-distortion performance of GVW, LLZ and HYB on a data string of length n = 1050
bits generated from a Bernoulli source with parameter p = 0.2. The solid curve is the rate-distortion function; the 
rate-distortion pairs achieved by GVW are shown as blue diamonds; by LLZ as red stars; and by HYB as bold green 
dots. 




Figure 4: Comparison of the rate-distortion performance of GVW, LLZ and HYB on a data string of length n = 1050 
symbols generated from the Uniform source on {0,1,2,3}. The solid curve is the rate-distortion function; the rate- 
distortion pairs achieved by GVW are shown as blue diamonds; by LLZ as red stars; and by HYB as bold green dots. 
Bern(0.2) source, Hamming distortion

                       Performance parameters
Algorithm   D_target   D_achieved     rate           memory         time
GVW         0.04       0.05429        0.50381        25MB           19m05s
GVW         0.055      0.07048        0.4381         28MB           18m13s
GVW         0.07       0.08857        (illegible)    (illegible)    (illegible)
GVW         0.085      0.10476        (illegible)    (illegible)    20m14s
GVW         0.1        (illegible)    0.28476        (illegible)    (illegible)
GVW         0.115      (illegible)    0.21905        (illegible)    20m03s
GVW         0.13       0.12857        0.17524        (illegible)    19m57s
GVW         0.145      (illegible)    (illegible)    (illegible)    (illegible)
GVW         0.16       0.16286        0.10952        126MB          19m38s
LLZ         0.04       0.0381         0.64495        1.36MB         3m05s
LLZ         0.055      0.05048        0.59165        2.02MB         7m45s
LLZ         0.07       0.06857        (illegible)    1.9MB          8m02s
LLZ         0.085      (illegible)    (illegible)    (illegible)    (illegible)
LLZ         0.1        0.09714        0.42154        3.1MB          22m18s
LLZ         0.115      0.11619        0.3083         5.2MB          24m03s
LLZ         0.13       0.13048        0.26809        8.3MB          58m07s
LLZ         0.145      (illegible)    (illegible)    (illegible)    132m30s
LLZ         0.16       (illegible)    (illegible)    (illegible)    (illegible)
HYB         0.04       0.05429        0.50381        0.56MB         2m02s
HYB         0.055      0.07048        0.4381         0.53MB         2m54s
HYB         0.07       0.08952        0.37238        0.57MB         3m32s
HYB         0.085      0.08286        0.32857        0.58MB         3m52s
HYB         0.1        0.12           0.28476        0.57MB         4m46s
HYB         0.115      0.12857        0.21905        0.56MB         5m21s
HYB         0.13       0.13143        0.17524        0.55MB         5m45s
HYB         0.145      0.14286        0.15333        0.52MB         6m30s
HYB         0.16       0.17429        0.10952        0.52MB         7m11s


Table 3: Comparison of the performance of GVW, LLZ and HYB on a data string of length n = 1050 bits generated 
from a Bernoulli source with parameter p = 0.2. 
U{0, 1, 2, 3} source, Hamming distortion

                       Performance parameters
Algorithm   D_target   D_achieved     rate           memory         time
GVW         0.1        0.1419         1.41714        43MB           10m27s
GVW         0.16       0.20095        1.16095        24MB           6m44s
GVW         0.22       (illegible)    (illegible)    (illegible)    8m19s
GVW         0.28       (illegible)    (illegible)    44MB           11m12s
GVW         0.34       0.36762        0.56952        (illegible)    9m45s
GVW         0.4        (illegible)    (illegible)    (illegible)    12m29s
GVW         0.46       (illegible)    0.30667        (illegible)    13m59s
GVW         0.52       (illegible)    (illegible)    (illegible)    (illegible)
GVW         0.58       0.58952        0.10952        229MB          17m30s
LLZ         0.1        0.06857        1.97778        3.597MB        9m54s
LLZ         0.16       0.1381         1.53794        1.79MB         7m46s
LLZ         0.22       (illegible)    (illegible)    2.04MB         (illegible)
LLZ         0.28       (illegible)    (illegible)    (illegible)    (illegible)
LLZ         0.34       0.33524        0.76228        3.445MB        28m25s
LLZ         0.4        0.4019         0.5393         3.49MB         30m37s
LLZ         0.46       0.46381        0.3893         5.44MB         46m19s
LLZ         0.52       0.52571        0.25807        (illegible)    105m56s
LLZ         0.58       (illegible)    (illegible)    (illegible)    (illegible)
HYB         0.1        0.1419         1.41714        2.58MB         7m49s
HYB         0.16       0.19714        1.16095        1.22MB         5m06s
HYB         0.22       0.25429        0.92           1.26MB         6m37s
HYB         0.28       0.30762        0.72286        1.39MB         8m48s
HYB         0.34       0.37238        0.56952        1.05MB         7m42s
HYB         0.4        0.42095        0.41619        1.18MB         9m39s
HYB         0.46       0.47143        0.30667        1.15MB         10m34s
HYB         0.52       0.52952        0.19714        1.01MB         10m14s
HYB         0.58       0.58476        0.10952        1.05MB         11m43s


Table 4: Comparison of the performance of GVW, LLZ and HYB on a data string of length n = 1050 symbols generated 
from the Uniform source on {0, 1, 2, 3}. 
5 Conclusions and Extensions



The starting point for this work was the observation that there is a certain duality relationship
between the divide-and-conquer compression schemes of Gupta-Verdú-Weissman (GVW) in [19], and
certain lossy Lempel-Ziv schemes based on a fixed database as in [2J]. To explore this duality, LLZ,
a new (non-universal) lossy LZ algorithm, was introduced, and it was shown to be asymptotically
rate-distortion optimal. To combine the low-complexity advantage of GVW with the low-memory
requirement of LLZ, a hybrid algorithm, called HYB, was then proposed, and its properties were
explored both theoretically and empirically.

The main contribution of this short paper is the introduction of memory considerations in the 
usual compression-complexity trade-off. Building on the success of the GVW algorithm, it was shown 
that the HYB scheme simultaneously achieves three goals: 1. Its rate-distortion performance can be 
made arbitrarily close to the fundamental rate-distortion limit; 2. The encoding complexity can be 
tuned in a rigorous manner so as to balance the trade-off of encoding complexity vs. compression 
redundancy; and 3. The memory required for the execution of the algorithm is much smaller than 
that required by GVW, a difference which may be made arbitrarily large depending on the choice of 
parameters. 

Moreover, empirically, for blocklengths of the order of thousands, the HYB scheme appears to out- 
perform existing schemes for the compression of simple memoryless sources with respect to Hamming 
distortion. 

Lastly, we briefly mention that the results presented in this paper can be extended in several
directions. First we note that the finite-alphabet assumption was made exclusively for the sake of
simplicity of exposition and to avoid cumbersome technicalities. While keeping the structure of all
three algorithms exactly the same, this assumption can easily be relaxed, at the price of longer, more
technical proofs, along the lines of arguments, e.g., in [49], [24], [25], [19]. For example, Theorem 4 of [19],
which gives precise performance and complexity bounds for the GVW used with general source and
reproduction alphabets and with respect to an unbounded distortion measure, can easily be generalized
to HYB. Similarly, Theorem 5 of [19], which describes the performance of a universal version of GVW,
can also be generalized to the corresponding statement for a universal version of HYB (with obvious
modifications), although, as noted in [19], that result is of purely theoretical interest.
Appendix



Proof of Theorem 1. 

Recall that, under the present assumptions, the rate-distortion function $R(D)$ is continuous,
differentiable, convex and nonincreasing. Given $D \in (0, D_{\max})$ and $\delta > 0$, assume without loss of
generality that $D + \delta < D_{\max}$; then we can choose $\gamma > 0$ according to $R(D + \delta) = R(D) - \gamma/2$, so that
$\bar{D} = D + \delta$. [As it does not change the asymptotic analysis below, we take $\alpha > 0$ fixed and arbitrary.]
Then the distortion part of the theorem is immediate by the construction of the algorithm.

Before considering the rate, we record two useful asymptotic results for the match-lengths $L_{\ell,k}$.
Let $R = R(D) + \gamma$, and $m = m(\ell) = \lfloor 2^{\ell R} \rfloor + \ell - 1$ as in (3). Then [pj, Theorem 23] immediately
implies that,

$$\lim_{\ell \to \infty} \frac{L_{\ell,1}}{\log m(\ell)} = \frac{1}{R(D)} \quad \text{w.p.1.}$$



Moreover, for any $\epsilon > 0$, the following more precise asymptotic lower bound on $L_{\ell,1}$ holds: as $\ell \to \infty$,



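$$(\log m(\ell)) \, \Pr\Big\{ L_{\ell,1} < \frac{\log m(\ell)}{R(D) + \epsilon} \,\Big|\, X_1^\infty \Big\} \to 0 \quad \text{w.p.1.} \qquad (9)$$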



The proof of (9) is a straightforward simplification of the proof of [24, Corollary 3], and is therefore
omitted.

Now let $\epsilon > 0$ be arbitrary. The encoder parses the message $X_1^n$ into $\Pi_n$ distinct words $Z^{(k)}$, each of
length $L_{\ell,k}$. We let $\lambda_\ell = (\log m(\ell)) / (R(D) + \epsilon)$ and, following [48], we assume, without loss of generality,
that $\lambda_\ell$ is an integer and that the last phrase is complete, i.e.,

$Z^{(\Pi_n)}$ has length $L_{\ell,\Pi_n}$.

To bound above the rate obtained by LLZ, we consider phrases of different lengths separately. We
call a phrase $Z^{(k)}$ long if its length satisfies $L_{\ell,k} > \lambda_\ell$; and we call $Z^{(k)}$ short otherwise. Recalling (6),
the total description length of the LLZ can be broken into two parts as,

AW) < E [riog((l + aK)l + [L,,fclog|in 



k: ZC") is short 

+ E 

k: is long 



[log((l + a)^)] + [logm] 



(10) 



For the first sum we note that, by the choice of $m(\ell)$ and the definition of a short phrase, each summand
is bounded above by a constant times $\lambda_\ell$, at least for all $\ell$ large enough; therefore, the conditional
expectation of the whole sum given $X_1^\infty$ is bounded by,




o<A'} 



X^ \ < C2 log m{i) n Pr <| < ^}^^ 



R{D)+e 



where $I_F$ denotes the indicator function of an event $F$, and the inequality follows by considering not
just all $k$'s, but all the possible positions in $X_1^n$ where a short match can occur. Dividing by $n$ and
letting $n \to \infty$, from (9) we get that this expression converges to zero w.p.1, so that the conditional
expectation of the first term in (10) also converges to zero, w.p.1.
For the second and dominant term in (10), let $\Pi_n^{\mathrm{long}}$ be the number of long phrases $Z^{(k)}$. Since each
long $Z^{(k)}$ has length $L_{\ell,k} \geq \lambda_\ell$, we must have $\Pi_n^{\mathrm{long}} \lambda_\ell \leq n$, so that

$$\frac{\Pi_n^{\mathrm{long}}}{n} \leq \frac{R(D) + \epsilon}{\log m(\ell)}.$$

Also, by the definition of $m(\ell)$, for all $\ell$ large enough (independently of $n$), we have,

$$\log((1 + \alpha)\ell) \leq \epsilon \log m(\ell).$$

Therefore, the second sum in (10) can be bounded above by,

$$\Pi_n^{\mathrm{long}} (1 + \epsilon) \log m(\ell) \leq n (1 + \epsilon)(R(D) + \epsilon).$$

Combining this with the fact that the first term in (10) vanishes, immediately yields

(A{X^,D,e) 



lim suplim sup E 



< {R{D)+e){l+e) w.p.l. 



and since $\epsilon > 0$ was arbitrary we get the first claim in the theorem. Finally, the second claim follows
from the first and Fatou's lemma. □
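
The step "for all $\ell$ large enough" above relies on $\log m(\ell)$ growing linearly in $\ell$ while $\log((1+\alpha)\ell)$ grows only logarithmically. The following sketch illustrates this numerically. It assumes base-2 logarithms and the database length $m(\ell) = \lfloor 2^{\ell R} \rfloor + \ell - 1$; the parameter values and function names are illustrative, not taken from the paper.

```python
import math


def database_length(ell, rate):
    """m(l) = floor(2^(l*R)) + l - 1 (floats overflow once l*R exceeds about 1023)."""
    return math.floor(2 ** (ell * rate)) + ell - 1


def overhead_is_small(ell, rate, alpha, eps):
    """Check log2((1 + alpha) * l) <= eps * log2(m(l))."""
    return math.log2((1 + alpha) * ell) <= eps * math.log2(database_length(ell, rate))


def threshold_block_length(rate, alpha, eps, max_ell=500):
    """Smallest l0 such that the inequality holds for every l in [l0, max_ell].

    Returns None if the inequality still fails at max_ell.
    """
    last_failure = 0
    for ell in range(1, max_ell + 1):
        if not overhead_is_small(ell, rate, alpha, eps):
            last_failure = ell
    return last_failure + 1 if last_failure < max_ell else None


if __name__ == "__main__":
    # Hypothetical values: R = 0.5 bits/symbol, alpha = 1, eps = 0.1.
    print(threshold_block_length(rate=0.5, alpha=1.0, eps=0.1))
```

A smaller $\epsilon$ pushes the threshold to larger $\ell$, which is consistent with the order of limits in the theorem: $\ell$ is taken large for each fixed $\epsilon$.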

Proof of Theorem 2. 

The proof of the theorem is based on Lemma 1 below, which plays the same role as [19, Lemma 1]
in the proof of [19, Theorem 1]. The rest of the proof is identical, except for the fact that we do
not need to invoke the law of large numbers, since here we do not claim that the probability of excess
distortion goes to zero. □

Before stating the lemma, we define the following auxiliary quantities: $D_1 = D/2$, $K(D) =
(D - D_1)/(R(D_1) - R(D))$,



l8L>niax''32(i?'(Z)/2)L>^ax)2 

and, 



e = mm • 



rexp{16C(0)) I 



Lemma 1. Consider a memoryless source $\{X_n\}$ to be compressed at target distortion level $D \in
(0, D_{\max})$. Then for any $0 < \epsilon < \bar{\epsilon}$, the HYB algorithm with parameters $0 < \gamma < \bar{\gamma}$ and



c{D)r 



1 , 3(Z?max - D) 
-^log 



(12) 



when applied to a single block achieves rate $R = R(D) + \gamma$, and its expected distortion is less than
$D + \epsilon$.

Proof. Given $\epsilon > 0$, choose a positive $\epsilon' < \epsilon$ such that,

log < e. 



C{D) 
Now follow the proof of [19, Lemma 1] with $\epsilon'$ in place of $\epsilon$, until the beginning of the computation
of the probability of excess distortion. The key observation is that, for HYB, this probability can be
bounded above by the excess-distortion probability with respect to a random codebook with

words, by just considering possible matches starting at positions $i = 1, \ell + 1, 2\ell + 1, \ldots$, making the
corresponding potentially matching blocks in the database independent. Therefore, following the same
computation, the required probability can be bounded above as before by,

2^^-ec(Dh')+l2-'^/\ (13) 

The first term is bounded above by, 

2e' 



as before, and in order to show that the expected distortion is less than $\epsilon$ it suffices to show that the
last term satisfies,

{D^^, - Z))^2^(«(^)+^) < e/3. (14) 
Substituting the choice of $\ell$ from (12), it becomes,



■log ( 



and since $\gamma$ is restricted to be less than one, this can in turn be bounded above, uniformly in $\gamma \in (0, 1)$,
by its value at $\gamma = 1$. [To see that, note that the function $f(x) = A x^2 \exp\{-Bx\}$ is increasing for
$x < 2/B$ and decreasing for $x > 2/B$. By our choice of $\epsilon$, the maximum above is achieved at the point
$x = 1/\gamma = 1$.] Therefore, noting also that $4C(D) < 1$, this term is bounded above by,



c{D) ( 7' 



max 



which, after some algebra, simplifies to. 



log 



3C{D) 



and this is less than $\epsilon/3$ by the choice of $\epsilon'$. This establishes (14) and completes the proof of the
lemma. □
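
The monotonicity claim used in the bracketed remark of the proof can be verified with one derivative. The short derivation below is a sketch for generic constants $A, B > 0$; the observation that the restriction to $x \geq 1$ requires $B \geq 2$ is an inference from the statement in the proof, not a quotation from it.

```latex
\begin{align*}
f(x)  &= A x^{2} e^{-Bx}, \qquad A, B > 0, \\
f'(x) &= 2A x\, e^{-Bx} - A B x^{2} e^{-Bx} = A x\, e^{-Bx} (2 - Bx), \\
f'(x) &> 0 \ \text{for } 0 < x < 2/B, \qquad f'(x) < 0 \ \text{for } x > 2/B, \\
\max_{x > 0} f(x) &= f(2/B) = \frac{4A}{B^{2} e^{2}}.
\end{align*}
% If 2/B <= 1, f is decreasing on [1, infinity), so over x = 1/gamma >= 1
% the largest value is f(1) = A e^{-B}, attained at gamma = 1.
```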



Proof of Theorem 3. 

Taking $c > 0$ arbitrary, we let, as in the proof of [19, Theorem 3],



e{n) 



logg(^) ' 
R{D) + c 



and 7(^1-) 



'log^(n) 
e{n) 



For each $n$ we use HYB with the corresponding parameters; the rate result follows from the
construction of the algorithm, which, at blocklength $n$, has rate no larger than,



$R(D) + \gamma(n) \to R(D)$ bits/symbol, as $n \to \infty$.

Regarding the distortion, equation (13) in the proof of Theorem 2 shows that the probability
of the event that the distortion of the $i$th block will exceed $D$ is bounded above by,

2(2-^WC(^)7(")") _^ £(n)2-^(")'^(")/^. 
It is easily seen that, for large n, this is dominated by the second term, 

£(n)2"(^/^V^(") i°g^W. 
Therefore, the distortion of any one $\ell$-block is bounded above by,

D + Dmax^(n)2-(l/4)Vn«)l°g^(«). 

Noting that the excess term goes to zero as $n \to \infty$, it will still go to zero when averaged out over all
$n/\ell(n)$ sub-blocks, and, therefore, the expected distortion over the whole message $X_1^n$ will converge
to $D$.

Finally, the complexity results are straightforward by construction; see the discussion in
[19, Section II-A]. □